refactor: framework de-coupling — generic runtime, ASR as use case (v1.1) by aksOps · Pull Request #2 · RandomCodeSpace/asr

aksOps · 2026-05-07T01:58:22Z

Summary

v1.0 (PR #1) made workflow correctness structural — typed tools, schema-validated boundaries, per-session locks. But the runtime itself still hardcoded ASR-incident-management vocabulary: _TERMINAL_TOOL_RULES knew mark_resolved/mark_escalated/notify_oncall by name, runtime/mcp_servers/observability.py lived in the framework dir, Session.id_format returned INC-*. Smoking gun: examples/code_review/ was structurally broken — every code_review session would land at needs_review because the framework didn't recognize its terminal tools.

This PR makes the runtime structurally framework-generic. ASR is now a use case, not the framework.

What changed (4 phases, 12 commits, +3.4k LOC, 927 tests passing)

| Phase | Concern | Fix |
| --- | --- | --- |
| 5 (DECOUPLE-01) | Hardcoded session-id prefix | `FrameworkAppConfig.session_id_prefix` (1-16 chars, validated). Apps configure: `INC` / `REVIEW` / `SES`. 16 boundary tests. |
| 6 (DECOUPLE-02, 03, 06) | Tool names + status vocab in `src/runtime/` | App-registered `OrchestratorConfig.terminal_tools` + `statuses` + `default_terminal_status`. Generic `extract_fields` for team/severity/etc. metadata capture. Parameterized escalate path (`escalate_action_tool_name` + `escalate_action_default_team`). Concept-leak ratchet test enforces zero incident-vocabulary in framework. |
| 7 (DECOUPLE-04) | App MCP servers in framework dir | `runtime/mcp_servers/{observability,remediation,user_context}.py` → `examples/incident_management/mcp_servers/`. Framework discovers via `OrchestratorConfig.mcp_servers: list[str]` dotted paths. Apps own their config-binding contracts (`register(mcp_app, cfg)`). 80 import sites flipped. |
| 8 (DECOUPLE-05, 07) | State schema + e2e validation | `OrchestratorConfig.state_overrides_schema` (dotted-path pydantic class). `IncidentStateOverrides` + `CodeReviewStateOverrides` boundary tests. `tests/test_code_review_e2e.py` walks an actual code_review session and asserts terminal status from code_review's vocabulary, not `needs_review`. Generic `TerminalToolRule.match_args` discriminator added to support code_review's `set_recommendation(recommendation=approve\|request_changes\|comment)` single-tool dispatch. Binary ratchet: `RATCHET_ALLOWLIST` is now empty. |
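
The `match_args` dispatch in the Phase 8 row can be illustrated with a small stdlib-only sketch. This is not the PR's code: the dataclass stands in for the pydantic `TerminalToolRule`, `resolve_status` is a hypothetical helper, and first-match-wins ordering is an assumption. Only the field names (`tool_name`, `status`, `match_args`) and the status vocabulary come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TerminalToolRule:
    """Stand-in for the PR's pydantic model; only three fields are sketched."""
    tool_name: str
    status: str
    match_args: dict[str, str] = field(default_factory=dict)


def resolve_status(rules, tool_name, args, default):
    """Return the status of the first matching rule (hypothetical helper)."""
    for rule in rules:
        if rule.tool_name != tool_name:
            continue
        # An empty match_args matches every call to the tool.
        if all(args.get(k) == v for k, v in rule.match_args.items()):
            return rule.status
    return default


# Three rules on one tool, discriminated by args["recommendation"].
RULES = [
    TerminalToolRule("set_recommendation", "approved",
                     {"recommendation": "approve"}),
    TerminalToolRule("set_recommendation", "changes_requested",
                     {"recommendation": "request_changes"}),
    TerminalToolRule("set_recommendation", "commented",
                     {"recommendation": "comment"}),
]
```

With these rules, a `set_recommendation` call carrying `recommendation="request_changes"` resolves to `changes_requested`, while an unrecognized tool or argument value falls back to the app's default status.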

Tests

927 passing, 3 skipped (was 869 at v1.0; +58 net new tests)
Binary concept-leak ratchet: `git grep -E 'mark_(resolved|escalated)|notify_oncall|submit_hypothesis|update_incident|apply_fix' src/runtime/` returns zero matches
code_review e2e: 6/6 GREEN — approve / request_changes / comment dispatched correctly via match_args; sys.modules snapshot diff confirms zero incident_management* imports leaked into code_review session
All 4 dist bundles regenerated and AST-valid
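
For readers unfamiliar with the ratchet idea, the check amounts to scanning the framework tree for a fixed set of forbidden tokens and failing if any are found. The sketch below is an assumption-laden approximation, not the contents of `tests/test_concept_leak_ratchet.py`: `find_leaks` is a hypothetical name, and it walks `*.py` files on disk rather than shelling out to `git grep` over tracked files.

```python
import re
from pathlib import Path

# Same six tokens as the git grep pattern in the Tests list.
FORBIDDEN = re.compile(
    r"mark_(resolved|escalated)|notify_oncall|submit_hypothesis"
    r"|update_incident|apply_fix"
)


def find_leaks(root):
    """Return (relative path, line number, line) for each forbidden token."""
    root = Path(root)
    leaks = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), 1):
            if FORBIDDEN.search(line):
                rel = path.relative_to(root).as_posix()
                leaks.append((rel, lineno, line.strip()))
    return leaks
```

A binary ratchet in this style would simply assert `find_leaks("src/runtime") == []`, with no allowlist to consult.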

What the framework no longer knows

Hardcoded session-id format (INC-*)
Tool names (mark_resolved, mark_escalated, notify_oncall, submit_hypothesis, update_incident, apply_fix)
Status vocabulary (escalated, resolved, needs_review — all app-declared now)
"Team" / "escalation_teams" / "environment" semantics — apps own them via extract_fields
App MCP server module paths — apps declare via OrchestratorConfig.mcp_servers
State-overrides shape — apps register pydantic schemas

These are now structural guarantees enforced by the binary leak ratchet.

Migration impact

incident_management: YAML config now declares terminal_tools, statuses, default_terminal_status, escalate_action_tool_name, mcp_servers, state_overrides_schema. Existing skills, MCP server, and runtime behavior unchanged. End-to-end test suite passes.
code_review: Now structurally functional. YAML declares its own terminal_tools (3 rules dispatching set_recommendation via match_args), statuses (approved/changes_requested/commented/unreviewed), state_overrides_schema. e2e test verifies the genericity invariant.
Future apps: Drop a YAML config + a skills/ dir + an MCP server, declare the registry blocks, ship. No framework changes needed.

Known follow-ups (deferred to v1.2 Hardening)

HARD-01..09: LLM HTTP timeouts, dependency lockfile, pyright gate flip, bare-except cleanup, hardcoded ollama.com fallback URL, ApprovalWatchdog leak, singleton thread-safety, dist staleness CI gate, ui.py zero-coverage. All tracked in .planning/REQUIREMENTS.md v2 section.

Test plan

Full pytest: 927 passed, 3 skipped
Binary ratchet GREEN
dist/* regenerated (4 files: app.py, ui.py, apps/incident-management.py, apps/code-review.py)
code_review e2e GREEN (6/6 cases including cross-app state-override rejection)
ruff clean on touched files
Manual smoke: start a fresh code_review session in the Streamlit UI, walk through review → recommendation → verdict
Manual smoke: incident_management end-to-end via real LLM (verify no regression)

🤖 Generated with Claude Code

Replace hardcoded INC-YYYYMMDD-NNN session-id format with config-driven prefix.

Apps declare their prefix via FrameworkAppConfig.session_id_prefix (default "SES", validated 1-16 chars, alphanumeric + hyphens). The prefix threads through SessionStore -> Session.id_format -> repository.

Configs:
- incident_management.yaml: INC
- code_review.runtime.yaml: REVIEW
- config.yaml (default): SES (also explicitly INC for asr-default)

Validation rejects empty / whitespace / symbols / underscore / >16-chars at config-load time via field_validator.

First step toward v1.1 framework de-coupling: runtime no longer hardcodes the incident-management session-id convention.

Tests: 869 passed, 3 skipped (16 new parametrized tests in tests/test_session_id_prefix.py covering default, custom prefixes INC/REVIEW/HR/MY-APP/16-char, and invalid inputs). dist/* bundles regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
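
The validation rule described in this commit (1-16 characters, alphanumeric plus hyphens) can be approximated without pydantic. The sketch below is illustrative only: `validate_prefix` and `format_session_id` are hypothetical names, the zero-padded three-digit sequence mirrors the `INC-YYYYMMDD-NNN` shape quoted above, and accepting lowercase letters is an assumption.

```python
import re
from datetime import date

# 1-16 characters, letters/digits/hyphens only (underscore is rejected).
_PREFIX_RE = re.compile(r"[A-Za-z0-9-]{1,16}")


def validate_prefix(prefix: str) -> str:
    """Raise ValueError unless the prefix fits the documented constraints."""
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(
            f"session_id_prefix {prefix!r} must be 1-16 chars, "
            "alphanumeric or hyphen"
        )
    return prefix


def format_session_id(prefix: str, day: date, seq: int) -> str:
    """Build a PREFIX-YYYYMMDD-NNN style id (hypothetical helper)."""
    return f"{validate_prefix(prefix)}-{day:%Y%m%d}-{seq:03d}"
```

Under these assumptions, `format_session_id("REVIEW", date(2026, 5, 7), 1)` yields `REVIEW-20260507-001`, and a prefix such as `MY_APP` is rejected before any id is built.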

New module src/runtime/terminal_tools.py introduces the type system for the generic terminal-tool registry that replaces the hardcoded _TERMINAL_TOOL_RULES table in orchestrator.py and the _TYPED_TERMINAL_TOOLS frozenset in graph.py (consumed in Wave 2).

- TerminalToolRule: tool_name + status + extract_fields dict, all extra="forbid" to reject typos in app YAML at boot (D-06-01).
- extract_fields generalises the v1.0 _extract_team(team_keys) positional-tuple lookup into a {dest: [args.X|result.X]} dict so apps name fields whatever fits (team, severity, reviewer) (D-06-02).
- StatusDef: name + terminal + kind (Literal of 5 categories), no color or label (UI presentation owned by UIConfig.badges per D-06-05 rejected alternative).

The new module is vocabulary-free: zero references to mark_resolved/mark_escalated/notify_oncall/etc, satisfying DECOUPLE-02 for the framework side.
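
The `{dest: [args.X|result.X]}` lookup shape can be sketched as follows. This is a guess at the semantics rather than the module's code: the function name reuses `extract_fields` for readability, and the "first non-None source wins" and "missing destinations are omitted" behaviours are assumptions.

```python
def extract_fields(spec, args, result):
    """Resolve each destination from an ordered list of 'args.X'/'result.X' paths."""
    sources = {"args": args, "result": result}
    out = {}
    for dest, paths in spec.items():
        for path in paths:
            # "args.team" -> source "args", key "team"
            source, _, key = path.partition(".")
            value = sources.get(source, {}).get(key)
            if value is not None:
                out[dest] = value
                break  # assumption: first source that yields a value wins
    return out


# Hypothetical rule: take the team from the call args, else from the result.
SPEC = {
    "team": ["args.team", "result.escalated_to"],
    "severity": ["args.severity"],
}
```

Called with `args={"severity": "high"}` and `result={"escalated_to": "platform-oncall"}`, this sketch returns `{"team": "platform-oncall", "severity": "high"}`.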

Add three new fields and a cross-field model_validator to OrchestratorConfig so apps can declare their terminal-tool rules and status vocabulary in YAML without writing Python (D-06-03).

Fields:
- terminal_tools: list[TerminalToolRule] (D-06-01)
- statuses: dict[str, StatusDef] (D-06-05)
- default_terminal_status: str | None (D-06-06)

Cross-field invariants enforced at config-load (raises pydantic.ValidationError):
- statuses non-empty => default_terminal_status required
- default_terminal_status must reference a declared status
- referenced default status must have terminal=True
- every terminal_tools[i].status must reference a declared status
- empty statuses with non-empty terminal_tools or set default_terminal_status is rejected (config mistake)

Bare OrchestratorConfig() still constructs with empty registry: the framework default for unconfigured apps stays valid.

Also tightens model_config = {"extra": "forbid"} on OrchestratorConfig so unknown YAML keys fail loudly at boot, matching the extra="forbid" pattern used by sibling configs in this file.

Wave 2 (06-02) will consume these fields in _finalize_session_status / _harvest_typed_terminal and ship the same-PR incident_management.yaml registration (D-06-04).
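
The five cross-field invariants listed in this commit can be restated as a plain function. The sketch below deliberately avoids pydantic, so it raises `ValueError` instead of `pydantic.ValidationError`; `check_registry` is a hypothetical name, and statuses and rules are modelled as plain dicts rather than `StatusDef` / `TerminalToolRule` instances.

```python
def check_registry(statuses, terminal_tools, default_terminal_status):
    """Raise ValueError if the registry violates the listed invariants."""
    if not statuses:
        if terminal_tools or default_terminal_status is not None:
            raise ValueError(
                "terminal_tools / default_terminal_status require statuses"
            )
        return  # bare config: an empty registry stays valid
    if default_terminal_status is None:
        raise ValueError("default_terminal_status is required")
    if default_terminal_status not in statuses:
        raise ValueError(
            f"default_terminal_status {default_terminal_status!r} is not "
            f"declared; valid: {sorted(statuses)}"
        )
    if not statuses[default_terminal_status]["terminal"]:
        raise ValueError("default_terminal_status must be terminal")
    for i, rule in enumerate(terminal_tools):
        if rule["status"] not in statuses:
            raise ValueError(
                f"terminal_tools[{i}].status {rule['status']!r} "
                "is not a declared status"
            )


# Hypothetical code_review-style vocabulary for illustration.
STATUSES = {
    "in_progress": {"terminal": False},
    "approved": {"terminal": True},
    "unreviewed": {"terminal": True},
}
```

Checking everything in one pass at load time is what lets a bad YAML fail at boot rather than when the first session tries to finalize.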

tests/test_status_vocabulary.py exercises the new OrchestratorConfig.{terminal_tools,statuses,default_terminal_status} plus the TerminalToolRule / StatusDef pydantic models. Covers DECOUPLE-03 acceptance: framework rejects unknown statuses and unknown defaults at config-load, not at gateway-eval time.

Test coverage:
- TerminalToolRule shape: minimal + extract_fields round-trip + extra="forbid" rejects unknown keys (D-06-01, D-06-02)
- StatusDef shape: minimal + Literal kind enforcement + extra="forbid" rejects color (D-06-05, UI presentation leak)
- OrchestratorConfig happy path with mixed rules and statuses
- bare OrchestratorConfig() still constructs (framework default invariant)
- default_terminal_status without statuses rejected
- terminal_tools without statuses rejected
- default required when statuses non-empty
- default must reference a declared status (error names valid keys)
- default must be terminal=True (in_progress fails)
- terminal_tools[i].status must reference a declared status (error names the index and the offending value, both for index 0 and later indices)
- extra="forbid" on OrchestratorConfig survives the new fields

All 16 tests pass; full suite at 885 passed (was 869).

Two new test files; both fail until Tasks 2.2/2.3/2.4 land:

- tests/test_concept_leak_ratchet.py: DECOUPLE-06 binary ratchet walking src/runtime/ for the 6 forbidden tokens (mark_resolved, mark_escalated, notify_oncall, submit_hypothesis, update_incident, apply_fix). Currently fails because _TERMINAL_TOOL_RULES still lives in orchestrator.py and _TYPED_TERMINAL_TOOLS in graph.py.
- tests/test_terminal_tool_registry.py: 9 finalize-path integration tests covering both incident_management and a synthetic code_review-style registration. Currently fails because Orchestrator._infer_terminal_decision / _extract_field are not yet instance methods (D-06-08).

RED phase commit per the v1.1 milestone TDD discipline. GREEN follows in the atomic framework-migration commit.

…rized escalate (DECOUPLE-02, DECOUPLE-03, DECOUPLE-06)

Migrates `_finalize_session_status` and the typed-terminal harvester to read app-registered rules off `OrchestratorConfig` instead of the v1.0 hardcoded `_TERMINAL_TOOL_RULES` / `_TYPED_TERMINAL_TOOLS` tables. Ships the framework refactor and the matching incident_management YAML registration in a single atomic commit per D-06-04.

Decisions cited:
- D-06-02: extract_fields lookup syntax preserved (args.X / result.X).
- D-06-03: registry lives in app YAML; pydantic-validated at boot.
- D-06-04: same-PR atomic commit; incident_management end-to-end flow stays green throughout.
- D-06-05: StatusDef.kind drives the escalated_to mirror dispatch rather than hardcoded vocabulary.
- D-06-06: default_terminal_status is app-owned (incident_management uses needs_review; code_review uses unreviewed).
- D-06-08: `_finalize_session_status` reads `self.cfg.orchestrator` directly via instance method shape.

Resolutions applied:
- Resolution A (escalate path), Option B: parameterised on the new `OrchestratorConfig.escalate_action_tool_name` and `escalate_action_default_team` fields. The orchestrator's `resume_session(action="escalate")` looks up the matching rule in `terminal_tools`, drives status assignment from `rule.status`, and pulls extra-field destination key from `rule.extract_fields`. Hardcoded `notify_oncall` / `platform-oncall` / `escalated_to` literals removed from `orchestrator.py`.
- Resolution B (ratchet scope), Option 3: `RATCHET_ALLOWLIST` carries 8 entries with phase-handles (`Phase 7` for `mcp_servers/remediation.py`; `Phase 8` for `terminal_tools.py` docstrings + `storage/event_log.py` docstrings). Companion meta-test asserts each entry still matches a real line; when Phase 7 / 8 ship and the offending file is gone, the meta-test fails and forces stale-entry deletion. Allowlist is shrink-only.

New OrchestratorConfig fields:
- `patch_tools: list[str]`: tools whose `args.patch` blob the harvester folds into agent confidence/signal/rationale. Avoids a third hardcoded `update_incident` leak by routing the recognition through YAML.
- `harvest_terminal_tools: list[str]`: tools the harvester treats as typed-terminal for confidence capture but the orchestrator's finalize path does NOT transition status on. Used for `submit_hypothesis`-style stage-complete signals.
- `escalate_action_tool_name: str | None`: MCP tool the orchestrator invokes when a user clicks Escalate. None means no side-effect.
- `escalate_action_default_team: str | None`: fallback team when the user did not pick one.

Same-PR YAML registration shipped in:
- config/incident_management.yaml
- config/config.yaml
- config/config.yaml.example
- config/code_review.runtime.yaml (placeholder, full migration in Phase 8 per D-06-04)

Test fixtures updated to provide `cfg` to the finalize-only stub orchestrators (test_finalize_status_inference.py, test_session_lock.py, test_finalize_concurrent.py) and to pass `terminal_tool_names` / `patch_tool_names` to `_harvest_tool_calls_and_patches` / `make_agent_node` (test_harvester_typed.py, test_agent_node.py, test_gate.py). The `cfg` fixture in test_resume.py now mirrors the incident_management YAML registration so the parameterised escalate path flows through unchanged behaviour for legacy callers.

Verification:
- pytest: 899 passed (vs 870 baseline pre-Phase 6).
- ratchet `git grep`: every match is allowlisted and explained.
- ruff: clean on every touched file.
- All three production YAMLs (config.yaml, config.yaml.example, code_review.runtime.yaml) validate against the new OrchestratorConfig schema.

dist/ bundle regen deliberately deferred to Plan 06-03.

… close-out

Phase 6 close-out (Generic Terminal-Tool Registry + Status Vocabulary).

Final wave: regenerated dist/app.py, dist/apps/incident-management.py, dist/apps/code-review.py from the post-Phase-6 src/runtime/ + examples/ tree. dist/ui.py regenerated to byte-identical content (config-driven UI shell already generic; no v1.1 surface changes).

The bundles now embed TerminalToolRule + StatusDef pydantic models, drop the v1.0 _TERMINAL_TOOL_RULES table and _TYPED_TERMINAL_TOOLS frozenset, and carry per-rule extract_fields generalising the v1.0 _extract_team lookup.

Phase 6 final acceptance:
- git grep leak-ratchet: 9 matches, all allowlisted (2 Phase 7, 7 Phase 8) per RATCHET_ALLOWLIST in test_concept_leak_ratchet.py
- pytest -q: 899 passed (Phase 5 baseline 873; +28 from Phase 6: +14 status_vocabulary, +9 terminal_tool_registry, +5 ratchet)
- bundle smoke tests: 12 passed
- ruff src/runtime/ tests/ examples/: clean

Phase 6 totals: 3 plans, 11 tasks, 6 commits across 3 waves (8a249f3, 700b4e1, d7e8b26, 1b43017, 070e15d, this commit). Closes DECOUPLE-02, DECOUPLE-03, DECOUPLE-06.

STATE.md and 06-03-SUMMARY.md document the full hand-off to Phase 7 (DECOUPLE-04) + Phase 8 (DECOUPLE-05 + 07); both files are gitignored per project policy and NOT included in this commit.

Ready for /gsd-transition to Phase 7.

…covery (DECOUPLE-04)

Phase 7 / DECOUPLE-04: single atomic commit per D-07-04. Framework no longer hardcodes incident-vocabulary MCP servers; apps declare their per-tool servers via a generic configurable dotted-path list.

* D-07-01: moved (git mv, history preserved): src/runtime/mcp_servers/{observability,remediation,user_context,__init__}.py -> examples/incident_management/mcp_servers/
* D-07-02: new field OrchestratorConfig.mcp_servers: list[str] (src/runtime/config.py). YAML wiring:
  - config/config.yaml + config/incident_management.yaml + config.yaml.example declare orchestrator.mcp_servers with the 3 dotted paths under examples.incident_management.mcp_servers.*
  - config/code_review.runtime.yaml declares orchestrator.mcp_servers: []; empty-list branch verified clean (dist/apps/code-review.py contains zero examples.incident_management.mcp_servers refs).
* D-07-03: converted setters to register(mcp_app, cfg) contract:
  - observability.set_environments -> register(mcp_app, cfg) closing over cfg.environments via _make_environment_validator (snapshotted; no module-level mutable list).
  - remediation.set_escalation_teams -> register(mcp_app, cfg) closing over cfg.framework.escalation_teams (or cfg.escalation_teams fallback) into a module-level snapshot tuple.
  - user_context -> no-op register(mcp_app, cfg) for contract uniformity.
  Chosen contract: two-arg register(mcp_app, cfg). The orchestrator passes mcp_app=None; modules expose their own per-module FastMCP instance composed by the loader, and mcp_app exists for future composition needs.
* orchestrator.py:357-365 (now ~365-380): replaced the silent-swallow try/except setter calls with a single importlib-driven loop:

      for module_path in cfg.orchestrator.mcp_servers:
          mod = importlib.import_module(module_path)
          reg = getattr(mod, "register", None)
          if reg is None:
              raise RuntimeError(
                  f"orchestrator.mcp_servers entry {module_path!r} does "
                  f"not expose a `register(mcp_app, cfg)` callable"
              )
          reg(None, cfg)

  No more silent except: missing register raises explicitly per the must_haves contract.
* Bundler patched: scripts/build_single_file.py RUNTIME_MODULE_ORDER:66-68 (the three hardcoded mcp_servers paths) removed from the framework-only block; the equivalent entries added to INCIDENT_APP_MODULE_ORDER (under EXAMPLES_ROOT) so the per-tool modules ship in dist/apps/incident-management.py without leaking into dist/app.py (framework-only) or dist/apps/code-review.py.
* Bundles regenerated: dist/app.py, dist/ui.py, dist/apps/incident-management.py, dist/apps/code-review.py.
* Import-site flip: ~60 sites across 19 test files + 4 config YAMLs changed `runtime.mcp_servers.*` -> `examples.incident_management.mcp_servers.*`. `git grep -E 'runtime\\.mcp_servers|runtime/mcp_servers' -- src/ tests/ apps/ scripts/ examples/ config/` returns zero hits.
* Ratchet allowlist (tests/test_concept_leak_ratchet.py): removed the two `src/runtime/mcp_servers/remediation.py` entries (`apply_fix`, `notify_oncall`) and the explanatory Phase-7 comment. The `test_allowlist_entries_actually_match` meta-test stays GREEN because the file no longer exists under that path; leaving the entries would have failed the meta-test (stale allowlist).
* Test contract migration: tests/test_notify_oncall_team_required.py was importing the now-removed `set_escalation_teams` directly. It now binds the roster via the public `register(mcp_app, cfg)` contract with a SimpleNamespace cfg: same observable behavior, same three test cases. tests/test_resume.py cfg fixture: added `mcp_servers=[...]` to OrchestratorConfig so the orchestrator's new discovery loop binds the escalation roster (otherwise the snapshot would carry over from prior tests, a fixture-completeness bug exposed by the contract change, Rule 3).

Verification:
* git ls-files src/runtime/mcp_servers/ -> empty
* git ls-files examples/incident_management/mcp_servers/ -> 4 files (__init__, observability, remediation, user_context)
* full pytest suite: 899 passed (baseline preserved)
* ruff check (touched files): All checks passed
* dist/apps/incident-management.py: contains the 3 relocated MCP server bodies (verified via grep for FastMCP("observability") etc.)
* dist/apps/code-review.py: contains zero examples.incident_management.mcp_servers references; D-07-02 empty-list branch GREEN.

Closes DECOUPLE-04 from REQUIREMENTS.md.
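
To make the `register(mcp_app, cfg)` contract concrete, the sketch below runs the discovery loop from this commit against an in-memory stand-in module. Everything outside the loop body is hypothetical: `load_mcp_servers`, the `demo_mcp_server` module, and the flat `cfg.escalation_teams` attribute are invented for illustration, and no real FastMCP instance is involved.

```python
import importlib
import sys
import types
from types import SimpleNamespace


def load_mcp_servers(cfg):
    """Import each dotted path and call its register(mcp_app, cfg)."""
    for module_path in cfg.orchestrator.mcp_servers:
        mod = importlib.import_module(module_path)
        reg = getattr(mod, "register", None)
        if reg is None:
            raise RuntimeError(
                f"orchestrator.mcp_servers entry {module_path!r} does "
                f"not expose a `register(mcp_app, cfg)` callable"
            )
        reg(None, cfg)  # the commit says the orchestrator passes mcp_app=None


# Hypothetical in-memory module standing in for an app-owned MCP server.
demo = types.ModuleType("demo_mcp_server")
demo.bound_teams = ()


def _register(mcp_app, cfg):
    # Snapshot into a tuple so later config mutation cannot leak in.
    demo.bound_teams = tuple(cfg.escalation_teams)


demo.register = _register
sys.modules["demo_mcp_server"] = demo

cfg = SimpleNamespace(
    escalation_teams=["platform-oncall"],
    orchestrator=SimpleNamespace(mcp_servers=["demo_mcp_server"]),
)
load_mcp_servers(cfg)
```

After the call, `demo.bound_teams` holds `("platform-oncall",)`; a listed module without a `register` attribute raises `RuntimeError` instead of being skipped silently.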

…t (DECOUPLE-05, DECOUPLE-07) - v1.1 milestone close

Closes milestone v1.1 (Framework De-coupling). Single atomic commit per D-08-04 covering DECOUPLE-05 + DECOUPLE-07 + ratchet binarization.

DECOUPLE-05 (D-08-01, D-08-02):
* OrchestratorConfig.state_overrides_schema: str | None. Dotted path to a pydantic BaseModel subclass (`mod.path:Class` or `mod.path.Class`).
* importlib resolution at Orchestrator.create(); bad path raises RuntimeError with the offending path AND class name. Non-BaseModel targets are caught explicitly.
* start_session(state_overrides=…) runs cls.model_validate(...) after _coerce_state_overrides and before store.create. None skips validation entirely (D-08-02 backward compatibility).
* New IncidentStateOverrides (environment, severity) and CodeReviewStateOverrides (pr_url, repo, base_branch, pr_number). Both extra='forbid' so typos surface at session-start, not at gateway-eval (PVC-06 generalization).
* All three runtime YAMLs declare state_overrides_schema.

DECOUPLE-07 (D-08-03):
* TerminalToolRule.match_args: dict[str, str] = {}. NEW optional argument-value discriminator. Empty default preserves v1.0 single-rule shape; non-empty restricts the rule to tool calls whose args[k] == v for every declared key. Generic feature any app can use (priority, severity, recommendation, …).
* code_review.runtime.yaml: 5-status vocab + 3 set_recommendation rules dispatching by args.recommendation -> approved / changes_requested / commented; default_terminal_status: unreviewed.
* tests/test_code_review_e2e.py (6 tests): real Orchestrator from YAML + synthesized set_recommendation ToolCall + finalize hook. Asserts code_review-vocabulary status (NOT unreviewed/needs_review), state_overrides validation rejects unknown keys + cross-app shapes, no incident_management modules in sys.modules after the test.

Concept-leak ratchet binary closure:
* Scrubbed mark_resolved/mark_escalated/notify_oncall/apply_fix from src/runtime/terminal_tools.py:19-20,45-47 and src/runtime/storage/event_log.py:3,37. Replacement uses neutral <terminal_tool> / set_recommendation placeholders.
* RATCHET_ALLOWLIST = {} (was 5 entries).
* git grep '\b(mark_(resolved|escalated)|notify_oncall|submit_hypothesis|update_incident|apply_fix)\b' src/runtime/: zero matches.

Bundler:
* INCIDENT_APP_MODULE_ORDER and CODE_REVIEW_APP_MODULE_ORDER list their app's state.py first. dist/* regenerated; both schema classes embedded; ast.parse() OK.

Tests: 927 passed (was 899), with 22 new schema + 6 new e2e. ruff clean on all touched files. RATCHET_ALLOWLIST: {}.

Closes DECOUPLE-05 and DECOUPLE-07; 7/7 v1.1 DECOUPLE-* shipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
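
The dotted-path resolution described for `state_overrides_schema` accepts both `mod.path:Class` and `mod.path.Class`. A rough stdlib-only sketch of that resolution follows; `resolve_class` is a hypothetical name, and the final check only verifies that the target is a class, standing in for the PR's explicit `BaseModel` subclass check.

```python
import importlib


def resolve_class(dotted: str) -> type:
    """Resolve 'mod.path:Class' or 'mod.path.Class' to a class object."""
    if ":" in dotted:
        module_path, _, class_name = dotted.partition(":")
    else:
        module_path, _, class_name = dotted.rpartition(".")
    try:
        module = importlib.import_module(module_path)
        target = getattr(module, class_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # Name both the offending path and the class, as the commit describes.
        raise RuntimeError(
            f"cannot resolve {dotted!r} "
            f"(module {module_path!r}, class {class_name!r})"
        ) from exc
    if not isinstance(target, type):
        # Assumption: the real check is issubclass(target, BaseModel).
        raise RuntimeError(f"{dotted!r} does not name a class")
    return target
```

Resolving once at startup means a mistyped schema path fails when the orchestrator is created rather than when the first session is started.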

…_with_schema

CI was missing OLLAMA_API_KEY env var, causing _interpolate to KeyError when loading config/config.yaml. Tests should not depend on real secrets; set a placeholder via monkeypatch so the YAML parses without making real Ollama calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ds_with_schema

Prior fix only set OLLAMA_API_KEY but config.yaml has 5 `${VAR}` refs: OLLAMA_API_KEY, AZURE_ENDPOINT, AZURE_OPENAI_KEY, EXTERNAL_MCP_URL, EXT_TOKEN. Set placeholders for all of them so _interpolate succeeds in CI without any real secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
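
The failure mode in these two fixes can be reproduced with a toy interpolator. The real `_interpolate` is not shown in this PR, so the function below is an assumed approximation of its behaviour: it substitutes `${VAR}` references from a mapping and lets a missing name surface as `KeyError`. The five variable names are the ones listed in the commit message.

```python
import re

_VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(text, env):
    """Replace each ${VAR} with env[VAR]; a missing variable raises KeyError."""
    return _VAR_RE.sub(lambda m: env[m.group(1)], text)


# Placeholder values for every ${VAR} reference named in the commit.
REQUIRED = [
    "OLLAMA_API_KEY",
    "AZURE_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "EXTERNAL_MCP_URL",
    "EXT_TOKEN",
]
placeholders = {name: "placeholder" for name in REQUIRED}
```

In a pytest test, the equivalent setup would call `monkeypatch.setenv(name, "placeholder")` for each name before the config is loaded, so the environment is restored afterwards.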

sonarqubecloud · 2026-05-07T02:47:26Z

Quality Gate failed

Failed conditions
58.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

aksOps and others added 15 commits May 6, 2026 15:26

checkpoint: pre-yolo 2026-05-06T17:32:43

aa7a6f1

checkpoint: pre-yolo 2026-05-07T01:00:50

5b91d4c

checkpoint: pre-yolo 2026-05-07T01:10:11

4301dee

checkpoint: pre-yolo 2026-05-07T02:33:49

87d37d4

aksOps merged commit 0ff8914 into main May 7, 2026
7 of 8 checks passed

aksOps deleted the refactor/framework-decoupling branch May 14, 2026 06:16

aksOps mentioned this pull request May 16, 2026

feat: v2.0.0-rc3 — fix audit findings (finalizer, state_overrides, idempotency + 6 important) #41

Merged

aksOps merged 15 commits into main from refactor/framework-decoupling
